Moving Beyond Downstream Task Accuracy for Information Retrieval Benchmarking

Santhanam, Keshav; Saad-Falcon, Jon; Franz, Martin; Khattab, Omar; Sil, Avirup; Florian, Radu; Sultan, Md Arafat; Roukos, Salim; Zaharia, Matei; Potts, Christopher

arXiv:2212.01340 (cs)

[Submitted on 2 Dec 2022]

Abstract: Neural information retrieval (IR) systems have progressed rapidly in recent years, in large part due to the release of publicly available benchmarking tasks. Unfortunately, some dimensions of this progress are illusory: the majority of the popular IR benchmarks today focus exclusively on downstream task accuracy and thus conceal the costs incurred by systems that trade away efficiency for quality. Latency, hardware cost, and other efficiency considerations are paramount to the deployment of IR systems in user-facing settings. We propose that IR benchmarks structure their evaluation methodology to include not only metrics of accuracy, but also efficiency considerations such as query latency and the corresponding cost budget for a reproducible hardware setting. For the popular IR benchmarks MS MARCO and XOR-TyDi, we show how the best choice of IR system varies according to how these efficiency considerations are chosen and weighed. We hope that future benchmarks will adopt these guidelines toward more holistic IR evaluation.
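
As a rough illustration of the abstract's point that the best system depends on the efficiency budget, the sketch below picks the most accurate system among those that fit a latency and cost limit. It is not the authors' evaluation code: the `SystemResult` fields, the `best_under_budget` helper, the system names, and every number are hypothetical placeholders, and the cost unit is left arbitrary.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class SystemResult:
    """One benchmark entry, measured on a single fixed hardware setting."""
    name: str          # hypothetical system label
    accuracy: float    # downstream task metric, higher is better
    latency_ms: float  # per-query latency
    cost: float        # cost in an arbitrary, assumed unit


def best_under_budget(
    results: Sequence[SystemResult],
    max_latency_ms: float,
    max_cost: float,
) -> Optional[SystemResult]:
    """Return the most accurate system within both budgets, or None."""
    eligible = [
        r for r in results
        if r.latency_ms <= max_latency_ms and r.cost <= max_cost
    ]
    return max(eligible, key=lambda r: r.accuracy, default=None)


# Placeholder values for illustration only; these are NOT results from the paper.
RESULTS = [
    SystemResult("system_a", accuracy=0.40, latency_ms=400.0, cost=50.0),
    SystemResult("system_b", accuracy=0.38, latency_ms=60.0, cost=8.0),
    SystemResult("system_c", accuracy=0.30, latency_ms=10.0, cost=1.0),
]

if __name__ == "__main__":
    for latency, cost in [(1000.0, 100.0), (100.0, 10.0), (20.0, 2.0)]:
        winner = best_under_budget(RESULTS, latency, cost)
        print(latency, cost, winner.name if winner else None)
```

With these made-up entries, loosening or tightening the two budgets changes which system is selected, which is the kind of dependence an accuracy-only leaderboard would hide.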

Subjects:	Information Retrieval (cs.IR); Computation and Language (cs.CL)
Cite as:	arXiv:2212.01340 [cs.IR]
	(or arXiv:2212.01340v1 [cs.IR] for this version)
	https://doi.org/10.48550/arXiv.2212.01340

Submission history

From: Keshav Santhanam
[v1] Fri, 2 Dec 2022 17:57:06 UTC (110 KB)
